Java Threads & Concurrency: Understanding OS-Level Implementation
Table of Contents
- Thread Fundamentals
- OS-Level Thread Management
- CPU and Thread Execution
- Memory and RAM Interaction
- Practical Examples
Thread Fundamentals
What is a Thread?
A thread is the smallest unit of execution that can be scheduled by the operating system. Think of it as a lightweight process that shares memory with other threads in the same process.
Key Concept: When you create a Java thread, you're actually requesting the OS to create a native thread.
// Simple thread creation
Thread thread = new Thread(() -> {
    System.out.println("Running in: " + Thread.currentThread().getName());
});
thread.start(); // This triggers OS-level thread creation
OS-Level Thread Management
The Journey from Java to OS
When you call thread.start() in Java, here's what happens underneath:
Java Application (JVM)
↓
JVM Thread API
↓
Native Thread Library (pthreads on Linux, Windows Threads on Windows)
↓
Operating System Kernel
↓
Scheduler assigns thread to CPU core
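You can see that the JVM itself is backed by several native threads (garbage collector, JIT compiler, reference handler, and so on) by enumerating the live threads. A minimal sketch, assuming a standard HotSpot JVM (the class name is illustrative):

```java
public class LiveThreadsExample {
    public static void main(String[] args) {
        // Every entry here is backed by a native OS thread (1:1 mapping).
        // Expect to see "main" plus JVM service threads.
        Thread.getAllStackTraces().keySet().forEach(t ->
            System.out.println(t.getName() + " (daemon=" + t.isDaemon() + ")"));
    }
}
```

Running this even with no application threads typically prints half a dozen names, showing that the JVM requests native threads from the OS for its own housekeeping.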
Thread Models
1:1 Model (Java's platform threads use this)
- One Java thread = One OS thread
- Each Java thread maps directly to a kernel thread
- (Virtual threads, introduced in Java 21, are the exception: many virtual threads are multiplexed over a small pool of OS threads)
public class ThreadMappingExample {
    public static void main(String[] args) {
        // Creating 3 Java threads = 3 OS threads
        for (int i = 0; i < 3; i++) {
            Thread t = new Thread(() -> {
                // threadId() requires Java 19+; use getId() on older JDKs
                System.out.println("Java thread ID: " + Thread.currentThread().threadId());
                // ProcessHandle gives the OS process ID; the native thread IDs
                // inside it are visible with tools like jstack or top -H
                System.out.println("Process ID: " + ProcessHandle.current().pid());
            });
            t.start();
        }
    }
}
CPU and Thread Execution
How CPU Executes Threads
Single Core CPU:
Time Slice 1: Thread A executes
Time Slice 2: Thread B executes (context switch)
Time Slice 3: Thread A executes (context switch)
Time Slice 4: Thread C executes (context switch)
Multi-Core CPU:
Core 1: Thread A ┐
Core 2: Thread B │ All execute simultaneously
Core 3: Thread C │
Core 4: Thread D ┘
Context Switching
When the OS switches from one thread to another, it must:
- Save current thread state (registers, program counter, stack pointer) → RAM
- Load next thread state from RAM → CPU registers
- Resume execution
public class ContextSwitchExample {
    public static void main(String[] args) {
        // With 1000 threads on 8 cores, expect lots of context switching
        for (int i = 0; i < 1000; i++) {
            new Thread(() -> {
                // CPU time slicing happens here
                for (int j = 0; j < 1000000; j++) {
                    Math.sqrt(j); // CPU-intensive work
                }
            }).start();
        }
    }
}
Cost of Context Switching:
- Save/restore CPU registers: ~1-2 microseconds
- Cache invalidation (CPU cache needs to reload data)
- TLB (Translation Lookaside Buffer) flush
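You can get a rough feel for this cost by forcing two threads to hand control back and forth with wait()/notifyAll(): each hand-off involves a context switch. A sketch, not a rigorous benchmark (the class name and round count are illustrative, and absolute numbers vary widely by OS and hardware):

```java
public class ContextSwitchCost {
    private static final Object lock = new Object();
    private static boolean pingTurn = true;
    private static final int ROUNDS = 100_000;

    public static void main(String[] args) throws InterruptedException {
        Thread ping = new Thread(() -> run(true));
        Thread pong = new Thread(() -> run(false));
        long start = System.nanoTime();
        ping.start();
        pong.start();
        ping.join();
        pong.join();
        long elapsed = System.nanoTime() - start;
        // Each round is two hand-offs (ping -> pong, pong -> ping)
        System.out.println("Approx. cost per hand-off: "
                + elapsed / (2L * ROUNDS) + " ns");
    }

    private static void run(boolean isPing) {
        for (int i = 0; i < ROUNDS; i++) {
            synchronized (lock) {
                // Wait until it is this thread's turn
                while (pingTurn != isPing) {
                    try {
                        lock.wait();
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                        return;
                    }
                }
                pingTurn = !isPing; // pass the turn to the other thread
                lock.notifyAll();
            }
        }
    }
}
```

The measured figure bundles the switch itself with lock hand-off and wakeup latency, which is why it usually comes out higher than the bare register save/restore cost quoted above.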
Memory and RAM Interaction
Thread Memory Layout
Within a single process, the heap and method area are shared by all threads, while each thread gets its own private stack:
┌─────────────────────────────────┐
│ PROCESS MEMORY SPACE │
├─────────────────────────────────┤
│ Heap (Shared by all threads) │ ← Objects created with 'new'
├─────────────────────────────────┤
│ Method Area (Shared) │ ← Class metadata, static variables
├─────────────────────────────────┤
│ Thread 1 Stack (Private) │ ← Local variables, method calls
├─────────────────────────────────┤
│ Thread 2 Stack (Private) │
├─────────────────────────────────┤
│ Thread 3 Stack (Private) │
└─────────────────────────────────┘
Memory Visibility Problem
public class MemoryVisibilityExample {
    // Without volatile, changes might not be visible across threads
    private static boolean flag = false;

    public static void main(String[] args) throws InterruptedException {
        // Thread 1: Reads flag
        Thread reader = new Thread(() -> {
            while (!flag) {
                // CPU might cache 'flag' value in a register
                // Never reads updated value from RAM!
            }
            System.out.println("Flag is now true!");
        });

        // Thread 2: Writes flag
        Thread writer = new Thread(() -> {
            try {
                Thread.sleep(1000);
                flag = true; // Written to CPU cache, maybe not RAM yet
                System.out.println("Flag set to true");
            } catch (InterruptedException e) {
                e.printStackTrace();
            }
        });

        reader.start();
        writer.start();
    }
}
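The minimal fix is to declare the flag volatile, which forces every read and write to go through main memory rather than a thread-local cached copy. A sketch of the corrected version (class name illustrative):

```java
public class VolatileFixExample {
    // volatile guarantees that writes by one thread are visible to all others
    private static volatile boolean flag = false;

    public static void main(String[] args) throws InterruptedException {
        Thread reader = new Thread(() -> {
            while (!flag) {
                // volatile read: always observes the latest written value
            }
            System.out.println("Flag is now true!");
        });

        Thread writer = new Thread(() -> {
            try {
                Thread.sleep(1000);
                flag = true; // volatile write: published to all threads
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });

        reader.start();
        writer.start();
        reader.join(); // terminates reliably, unlike the non-volatile version
        writer.join();
    }
}
```

Note that volatile only fixes visibility; it does not make compound operations like counter++ atomic, which is the separate problem shown next.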
CPU Cache and Memory Hierarchy
CPU Core
├─ L1 Cache (32-64 KB, ~1 ns access)
├─ L2 Cache (256 KB, ~3 ns access)
└─ L3 Cache (Shared, 8-32 MB, ~12 ns access)
↓
Main RAM (GB, ~100 ns access)
Why This Matters:
public class CacheCoherenceExample {
    private static int sharedCounter = 0;

    public static void main(String[] args) throws InterruptedException {
        Thread t1 = new Thread(() -> {
            for (int i = 0; i < 100000; i++) {
                // Core 1 reads sharedCounter into its cache,
                // increments it, and writes back (eventually)
                sharedCounter++;
            }
        });
        Thread t2 = new Thread(() -> {
            for (int i = 0; i < 100000; i++) {
                // Core 2 also reads sharedCounter into its cache
                // Both cores may hold different cached values!
                sharedCounter++;
            }
        });

        t1.start();
        t2.start();
        t1.join();
        t2.join();

        // Expected: 200000, Actual: usually less (lost updates)
        System.out.println("Counter: " + sharedCounter);
    }
}
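Besides synchronized (shown in Example 3 below), the lost-update problem can be fixed lock-free with java.util.concurrent.atomic, which relies on the CPU's atomic compare-and-swap instructions. A sketch (class name illustrative):

```java
import java.util.concurrent.atomic.AtomicInteger;

public class AtomicCounterExample {
    // incrementAndGet() is a single atomic read-modify-write at the hardware level
    private static final AtomicInteger counter = new AtomicInteger();

    public static void main(String[] args) throws InterruptedException {
        Runnable task = () -> {
            for (int i = 0; i < 100_000; i++) {
                counter.incrementAndGet(); // no updates can be lost
            }
        };
        Thread t1 = new Thread(task);
        Thread t2 = new Thread(task);
        t1.start();
        t2.start();
        t1.join();
        t2.join();
        System.out.println("Counter: " + counter.get()); // Always 200000
    }
}
```

Because no lock is taken, neither thread ever enters the BLOCKED state; contention is resolved by the cache coherence protocol retrying the compare-and-swap.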
Practical Examples
Example 1: CPU-Bound Task
public class CPUBoundExample {
    public static void main(String[] args) {
        int cores = Runtime.getRuntime().availableProcessors();
        System.out.println("CPU Cores: " + cores);

        // Creating threads = number of cores is optimal for CPU-bound tasks
        for (int i = 0; i < cores; i++) {
            Thread t = new Thread(() -> {
                // This thread gets a dedicated core
                long sum = 0;
                for (long j = 0; j < 1_000_000_000L; j++) {
                    sum += j;
                }
                System.out.println("Sum: " + sum);
            });
            t.start();
        }
    }
}
What Happens:
- JVM creates 8 threads (on 8-core CPU)
- OS scheduler assigns 1 thread per core
- Each core executes its thread with minimal context switching
- CPU utilization: ~100%
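In practice, the same pattern is usually written with a thread pool sized to the core count rather than raw threads, so worker threads are reused and results can be collected. A sketch using java.util.concurrent (class name illustrative):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class FixedPoolExample {
    public static void main(String[] args) throws Exception {
        int cores = Runtime.getRuntime().availableProcessors();
        // Pool size == core count: each worker can keep one core busy
        ExecutorService pool = Executors.newFixedThreadPool(cores);

        List<Future<Long>> results = new ArrayList<>();
        for (int i = 0; i < cores; i++) {
            results.add(pool.submit(() -> {
                long sum = 0;
                for (long j = 0; j < 100_000_000L; j++) {
                    sum += j;
                }
                return sum;
            }));
        }

        for (Future<Long> f : results) {
            System.out.println("Sum: " + f.get()); // blocks until the task finishes
        }
        pool.shutdown();
    }
}
```

The fixed pool avoids oversubscribing the CPU: submitting more tasks than cores simply queues them instead of creating extra OS threads that would only add context switching.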
Example 2: I/O-Bound Task
public class IOBoundExample {
    public static void main(String[] args) {
        // I/O-bound: Can create many more threads than cores
        for (int i = 0; i < 1000; i++) {
            Thread t = new Thread(() -> {
                try {
                    // Thread blocks, OS removes it from the CPU
                    Thread.sleep(1000); // Simulates I/O wait
                    // Thread wakes, OS schedules it back onto a CPU
                    System.out.println("Done waiting");
                } catch (InterruptedException e) {
                    e.printStackTrace();
                }
            });
            t.start();
        }
    }
}
What Happens:
- Thread calls sleep() → moves to TIMED_WAITING state
- OS removes the thread from the CPU's run queue
- CPU is free for other threads
- After the sleep expires, thread moves back to RUNNABLE → OS schedules it again
Example 3: Proper Synchronization
public class SynchronizedExample {
    private static int counter = 0;
    private static final Object lock = new Object();

    public static void main(String[] args) throws InterruptedException {
        Thread t1 = new Thread(() -> {
            for (int i = 0; i < 100000; i++) {
                synchronized (lock) {
                    // CPU acquires lock (atomic operation at hardware level)
                    // Memory barrier: flushes CPU cache to RAM
                    counter++;
                    // Memory barrier: ensures write is visible
                    // CPU releases lock
                }
            }
        });
        Thread t2 = new Thread(() -> {
            for (int i = 0; i < 100000; i++) {
                synchronized (lock) {
                    counter++;
                }
            }
        });

        t1.start();
        t2.start();
        t1.join();
        t2.join();
        System.out.println("Counter: " + counter); // Always 200000
    }
}
OS-Level Operations:
- Thread requests lock → OS/JVM checks lock status
- If locked: Thread goes to BLOCKED state (not using CPU)
- Lock owner releases → OS wakes waiting thread
- synchronized creates memory barriers (CPU-level instructions)
- Cache coherence protocol ensures all cores see the update
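The BLOCKED state described above can be observed directly: start one thread that holds the lock, then sample a second thread's state while it waits for it. A small sketch (class name and sleep timings are illustrative):

```java
public class BlockedStateExample {
    public static void main(String[] args) throws InterruptedException {
        Object lock = new Object();

        Thread holder = new Thread(() -> {
            synchronized (lock) {
                try {
                    Thread.sleep(1000); // hold the lock for a while
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            }
        });

        Thread waiter = new Thread(() -> {
            synchronized (lock) {
                // only entered once holder releases the lock
            }
        });

        holder.start();
        Thread.sleep(100); // let holder acquire the lock first
        waiter.start();
        Thread.sleep(100); // let waiter hit the contended lock
        // Typically prints BLOCKED: waiter is parked off-CPU by the OS/JVM
        System.out.println("Waiter state: " + waiter.getState());

        holder.join();
        waiter.join();
    }
}
```

While BLOCKED, the waiter consumes no CPU time; the scheduler only makes it runnable again when the monitor is released.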
Example 4: Understanding Thread States
public class ThreadStatesExample {
    public static void main(String[] args) throws InterruptedException {
        Object lock = new Object();
        Thread t = new Thread(() -> {
            synchronized (lock) {
                try {
                    System.out.println("RUNNABLE -> CPU executing");
                    Thread.sleep(1000);
                    System.out.println("TIMED_WAITING -> Off CPU, in RAM");
                } catch (InterruptedException e) {
                    e.printStackTrace();
                }
            }
        });

        System.out.println("NEW: " + t.getState()); // Thread object in heap
        t.start();
        System.out.println("RUNNABLE: " + t.getState()); // OS scheduled
        Thread.sleep(500);
        System.out.println("TIMED_WAITING: " + t.getState()); // Off CPU
        t.join();
        System.out.println("TERMINATED: " + t.getState()); // OS cleaned up
    }
}
Thread Lifecycle at OS Level
NEW (Java object in heap)
↓ start()
RUNNABLE (OS ready queue)
↓ OS Scheduler
RUNNING (Executing on CPU core)
↓ sleep()/wait()/I/O
WAITING/TIMED_WAITING (Off CPU, in RAM)
↓ notify()/interrupt()/I/O complete
RUNNABLE (Back to OS ready queue)
↓ Execution completes
TERMINATED (OS cleans up resources)
Key Takeaways
- Java Thread = OS Thread: Platform threads map 1:1 to native threads (virtual threads, Java 21+, are the exception)
- Context Switching: Expensive operation, save/restore state from RAM
- CPU Cores: Limit true parallelism (8 cores = max 8 threads running simultaneously)
- Memory Visibility: Changes in one core's cache might not be visible to others without synchronization
- Thread Stack: Each thread gets private stack space in RAM (~1 MB default)
- Shared Heap: All threads share heap memory, need synchronization
- OS Scheduler: Decides which thread runs on which core and when
Understanding these concepts helps you write efficient concurrent programs and debug threading issues!